knitr document van Steensel lab

Introduction

I sequenced the complete insert of the pMT06 pDNA library. I had already extracted all sequences upstream of the 3’ adapter from the sequencing data and collapsed identical sequences into counts with starcode. Here I want an overview of how many insert sequences in the pDNA library still match the designed inserts.

Data import
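
As a rough illustration of the import step, the sketch below reads a starcode-style table into a data frame. It assumes the starcode output is a tab-separated file with one sequence and one count per line; the column names `sequence` and `count` and the toy rows are made up for this example and are not taken from the pMT06 data.

```r
# Toy stand-in for a starcode output file (assumed layout: sequence <TAB> count).
# In a real run, `text = starcode_txt` would be replaced by the file path.
starcode_txt <- "ACGTACGTAAAC\t500\nTTGGCCAACCGT\t320\nACGTACGTGGTA\t40"

pdna_counts <- read.table(
  text = starcode_txt,
  sep = "\t",
  header = FALSE,
  col.names = c("sequence", "count"),  # hypothetical column names
  stringsAsFactors = FALSE
)

# Normalise counts to reads per million so libraries of different depth compare
pdna_counts$rpm <- pdna_counts$count / sum(pdna_counts$count) * 1e6
```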

Analysis

What is the barcode distribution of mapped vs. unmapped for both TFs?

Correlate to GC content
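
One possible way to relate barcode abundance to GC content is sketched below in base R. The helper `gc_content()`, the toy barcodes and counts, and the choice of a Spearman correlation on log10 counts are all assumptions made for illustration; they are not the original analysis.

```r
# Hypothetical helper: fraction of G/C bases in each sequence
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  n_gc <- nchar(gsub("[^GC]", "", seqs))  # keep only G and C, then count them
  n_gc / nchar(seqs)
}

# Toy data, not taken from the pMT06 library
toy <- data.frame(
  barcode = c("ACGTACGTACGT", "GGGCCCGGGCCC", "ATATATATATAT"),
  count = c(120, 15, 300),
  stringsAsFactors = FALSE
)

toy$gc <- gc_content(toy$barcode)

# Rank-based correlation between GC fraction and (log-scaled) read count
gc_cor <- cor(toy$gc, log10(toy$count), method = "spearman")
```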

Plot how many barcodes are found in pDNA data

How many raw complete sequences match with the design?
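
A minimal sketch of this comparison, assuming each designed construct can be written as insert followed by barcode (the real layout of pMT06 may differ). The data frames `design` and `observed` and their contents are toy values invented for the example.

```r
# Toy design table: one designed insert per barcode (assumed layout)
design <- data.frame(
  barcode = c("AAAC", "CCGT", "GGTA"),
  insert = c("ACGTACGT", "TTGGCCAA", "GATTACAG"),
  stringsAsFactors = FALSE
)
design$full <- paste0(design$insert, design$barcode)

# Toy sequenced reads with starcode-style counts
observed <- data.frame(
  sequence = c("ACGTACGTAAAC", "TTGGCCAACCGT", "ACGTACGTGGTA", "TTGGCAAACCGT"),
  count = c(500, 320, 40, 12),
  stringsAsFactors = FALSE
)

# A sequence matches the design only if insert and barcode are both exact
observed$exact_match <- observed$sequence %in% design$full

# Fraction of unique sequences and fraction of reads that match the design
frac_seqs <- mean(observed$exact_match)
frac_reads <- sum(observed$count[observed$exact_match]) / sum(observed$count)
```

Reporting both fractions is useful because a few abundant correct sequences can dominate the read count even when many low-count sequences do not match.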

Identify those barcodes that are attached to a wrong insert

Barcodes that are clearly assigned to the wrong insert can be reassigned to the correct insert. Barcodes that are attached to a mixed population of inserts should be excluded from any analysis in which this plasmid library was used.
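
The rule above could be implemented roughly as follows. This is a sketch under stated assumptions: the 0.9 cut-off (`min_fraction`) for calling an insert dominant is a hypothetical value, and the table `obs`, the lookup `designed`, and the function `classify_barcode()` are illustrative names rather than objects from the original analysis.

```r
# Toy read counts per barcode / best-matching insert combination
obs <- data.frame(
  barcode = c("BC1", "BC1", "BC2", "BC2", "BC3"),
  insert_match = c("ins1", "ins2", "ins2", "ins3", "ins1"),
  count = c(95, 5, 50, 50, 80),
  stringsAsFactors = FALSE
)

# Designed insert for each barcode (toy values)
designed <- c(BC1 = "ins1", BC2 = "ins2", BC3 = "ins3")

# Assumed threshold: share of reads the top insert needs to count as dominant
min_fraction <- 0.9

classify_barcode <- function(df) {
  per_insert <- tapply(df$count, df$insert_match, sum)
  top <- names(per_insert)[which.max(per_insert)]
  top_frac <- max(per_insert) / sum(per_insert)
  bc <- df$barcode[1]
  status <- if (top_frac < min_fraction) {
    "exclude"   # mixed population of inserts
  } else if (top == designed[[bc]]) {
    "correct"   # dominant insert is the designed one
  } else {
    "replace"   # dominant insert differs from the design
  }
  data.frame(barcode = bc, top_insert = top, top_fraction = top_frac,
             status = status, stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(split(obs, obs$barcode), classify_barcode))
```

In this toy example BC1 keeps its designed insert, BC2 is split evenly between two inserts and is excluded, and BC3 is consistently attached to a different insert and is marked for reassignment.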

Barcode re-evaluation

Investigate the mutational load of the barcodes with a good match
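
One way to quantify mutational load is the edit distance between each sequenced insert and its designed insert. The sketch below uses `adist()` from base R (package utils), which computes the generalized Levenshtein distance; the sequences are toy examples, and using edit distance as the measure of mutational load is an assumption of this sketch.

```r
# Toy designed insert and sequenced variants attached to the same barcode
designed_insert <- "ACGTACGTAC"
observed_inserts <- c(
  "ACGTACGTAC",   # identical to the design
  "ACGTTCGTAC",   # one substitution
  "ACGTACGAC",    # one deletion
  "ACGGTACGTAAC"  # two insertions
)

# Levenshtein distance (substitutions + insertions + deletions) to the design
mutations <- as.vector(adist(observed_inserts, designed_insert))

# Number of sequences per mutation count
load_table <- table(mutations)
```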

Exporting data

# # Export barcodes that are attached to multiple inserts
# bc_exclude <- matching_df_exclude$barcode %>% unique()
# write.csv(bc_exclude, "/DATA/usr/m.trauernicht/projects/tf_activity_reporter/data/SuRE_TF_1/pDNA_seq/bc_exclude.csv")
# 
# # Export barcodes that are attached to the wrong insert
# bc_replace <- pDNA_seq_incorrect %>% dplyr::select(barcode, `bc-match`, `insert-match`) %>% unique()
# write.csv(bc_replace, "/DATA/usr/m.trauernicht/projects/tf_activity_reporter/data/SuRE_TF_1/pDNA_seq/bc_replace.csv")

Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  50.97526 secs"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/pDNA_insert_seq"
date()
## [1] "Wed Dec  9 14:20:37 2020"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] shiny_1.4.0                 tibble_3.0.1               
##  [3] plotly_4.9.2.1              LncFinder_1.1.4            
##  [5] sunburstR_2.1.4             d3r_0.9.0                  
##  [7] vwr_0.3.0                   latticeExtra_0.6-29        
##  [9] lattice_0.20-38             stringdist_0.9.5.5         
## [11] ggbeeswarm_0.6.0            ggplot2_3.3.0              
## [13] dplyr_0.8.5                 readr_1.3.1                
## [15] tidyr_1.0.0                 phylotools_0.2.2           
## [17] ape_5.4-1                   maditr_0.6.3               
## [19] plyr_1.8.6                  ShortRead_1.42.0           
## [21] GenomicAlignments_1.20.1    SummarizedExperiment_1.14.1
## [23] DelayedArray_0.10.0         matrixStats_0.55.0         
## [25] Biobase_2.44.0              Rsamtools_2.0.3            
## [27] GenomicRanges_1.36.1        GenomeInfoDb_1.20.0        
## [29] Biostrings_2.52.0           XVector_0.24.0             
## [31] IRanges_2.18.3              S4Vectors_0.22.1           
## [33] BiocParallel_1.18.1         BiocGenerics_0.30.0        
## [35] seqinr_3.6-1               
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1       hwriter_1.3.2          ellipsis_0.3.0        
##  [4] class_7.3-15           farver_2.0.1           prodlim_2019.11.13    
##  [7] lubridate_1.7.4        codetools_0.2-16       splines_3.6.3         
## [10] knitr_1.30             ade4_1.7-13            jsonlite_1.7.1        
## [13] pROC_1.16.1            caret_6.0-85           png_0.1-7             
## [16] compiler_3.6.3         httr_1.4.1             fastmap_1.0.1         
## [19] assertthat_0.2.1       Matrix_1.2-18          lazyeval_0.2.2        
## [22] later_1.1.0.1          htmltools_0.5.0        tools_3.6.3           
## [25] gtable_0.3.0           glue_1.4.2             GenomeInfoDbData_1.2.1
## [28] reshape2_1.4.4         Rcpp_1.0.5             vctrs_0.2.4           
## [31] nlme_3.1-143           crosstalk_1.0.0        iterators_1.0.12      
## [34] timeDate_3043.102      gower_0.2.1            xfun_0.19             
## [37] stringr_1.4.0          mime_0.9               lifecycle_0.2.0       
## [40] zlibbioc_1.30.0        MASS_7.3-51.5          scales_1.1.0          
## [43] ipred_0.9-9            promises_1.1.1         hms_0.5.3             
## [46] RColorBrewer_1.1-2     yaml_2.2.1             rpart_4.1-15          
## [49] stringi_1.5.3          foreach_1.4.7          e1071_1.7-4           
## [52] lava_1.6.6             rlang_0.4.8            pkgconfig_2.0.3       
## [55] bitops_1.0-6           evaluate_0.14          purrr_0.3.3           
## [58] labeling_0.3           recipes_0.1.9          htmlwidgets_1.5.2     
## [61] tidyselect_1.1.0       magrittr_1.5           R6_2.5.0              
## [64] generics_0.0.2         pillar_1.4.3           withr_2.1.2           
## [67] prettydoc_0.4.0        survival_3.1-8         RCurl_1.95-4.12       
## [70] nnet_7.3-12            crayon_1.3.4           rmarkdown_2.5         
## [73] jpeg_0.1-8.1           grid_3.6.3             data.table_1.12.8     
## [76] ModelMetrics_1.2.2.1   digest_0.6.27          xtable_1.8-4          
## [79] httpuv_1.5.4           munsell_0.5.0          beeswarm_0.2.3        
## [82] viridisLite_0.3.0      vipor_0.4.5